Twenty-six structural variables were defined and
investigated using a set of algebraic word problems solved by 96
college students. The study attempted to identify a small, independent,
well-defined set of arithmetic, linguistic, and algebraic structural
variables that account for a maximum amount of the variance of the
observed probability correct of algebra word problems. Findings
showed that one linguistic variable, two algebraic variables, and
three arithmetic variables entered in the first six steps of a
stepwise linear regression. Five of the six variables had significant
t-values at the .05 level or lower. Six structural variables, defined
in terms of the number of words in the longest sentence, the logical
transitivity of the unknowns, the recall of formulas, the number of
digits in quotients, the number of transpositions, and the type of
arithmetic operations, seem to account for a large amount of the
variance (R² = .80) of the observed probability correct of
algebra word problems. (Author/DT)
An Analysis of Arithmetic, Linguistic, and Algebraic Structural
Variables That Contribute to Problem Solving Difficulty
in Algebra Word Problems

by

Blair Cook
The Pennsylvania State University

Twenty-six structural variables were defined and investigated using
a set of algebraic word problems solved by college students. The study
attempted to identify a small, independent, well-defined set of arithmetic,
linguistic, and algebraic structural variables that account for a maximum
amount of the variance of the observed probability correct of algebra word
problems. The study found that one linguistic variable, two algebraic
variables, and three arithmetic variables entered in the first six steps of a
stepwise linear regression. Five of the six variables had significant
t-values at the .05 level or lower.

Several structural variables found to be robust in studies of
arithmetic word problems in the elementary grades were also found
to be robust in the present study. Six structural variables, defined in
terms of the number of words in the longest sentence, the logical
transitivity of the unknowns, the recall of formulas, the number of digits
in quotients, the number of transpositions, and the type of arithmetic
operations, seem to account for a large amount of the variance (R² = .80) of
the observed probability correct of algebra word problems.
Annual Meeting, February, 1973
Several studies have investigated structural and linguistic
variables (Suppes, Jerman, and Brian, 1968; Suppes, Loftus, and Jerman, 1969;
Loftus, 1970; Jerman, 1971; Jerman and Rees, 1972; Jerman and Mirman, 1972;
Jerman, 1972; Krushinski, 1973). The studies using structural variables
attempted to account for the variance in the observed probability correct
of arithmetic word problems using a stepwise linear regression. An
underlying purpose of structural variable studies was to identify a small set
(about six) of independent, well-defined structural variables that could be
used in the generation of word problems of a predictable level of
difficulty.

In an arithmetic word problem structural variable study, Jerman
(1972) suggested that a structural variable study should be conducted "on
an entirely different set of problems." In the same study Jerman also
proposed that a similar investigation be made in the upper grades.

Hence, the purpose of the present study was to identify a small 
independent well-defined set of arithmetic, linguistic, and algebraic 
structural variables which account for the observed probability correct of 
algebra word problems with college students. 

Twenty-six structural variables were defined for the present study. 
Five arithmetic variables were selected for investigation from the Jerman 
(1972) study. They are defined as follows: 

1. RECALL. The sum of the following:

(a) One count was given for each formula to be recalled. 

(b) One count was given for each step in the formula. 

(c) One count was given for each conversion to be recalled and 
used. 

(d) One count was given for each fact from a previous problem
to be recalled and used.
2. OPER2. The sum of the following:

(a) One count was given for each different operation used. 

(b) Add four for one or more division operations. 

(c) Add two for one or more multiplication operations. 

(d) Add one for one or more addition operations.

3. OPER3. The sum of the following:

(a) One count was given for each different operation used. 

(b) Add four for each division operation. 

(c) Add two for each multiplication operation. 

(d) Add one for each addition operation. 

4. QUO. One of the following:

(a) One count was given for each digit of each quotient.

(b) Zero, if division was not used.

5. NOMC2. One count was given for each regrouping that occurred
in each multiplication.
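Once the operations, quotients, and multiplications of a worked solution have been tallied by hand, the four counting rules above are mechanical. The Python sketch below is one possible reading of them, not the author's coding procedure: the helper names are hypothetical, operation types are plain strings, and NOMC2 is handled only for multiplication by a single digit.

```python
# Weights from the OPER2 and OPER3 definitions. Subtraction has no listed
# weight, so it adds only to the count of different operations.
WEIGHTS = {"division": 4, "multiplication": 2, "addition": 1}


def oper2(operations):
    """OPER2: different operations, plus each weight once per operation type present."""
    kinds = set(operations)
    return len(kinds) + sum(WEIGHTS.get(kind, 0) for kind in kinds)


def oper3(operations):
    """OPER3: different operations, plus the weight of every occurrence."""
    return len(set(operations)) + sum(WEIGHTS.get(op, 0) for op in operations)


def quo(quotients):
    """QUO: one count per digit of each quotient; 0 when no division is used."""
    return sum(ch.isdigit() for q in quotients for ch in str(q))


def regroupings(multiplicand, digit):
    """Carries produced when a whole number is multiplied by a single digit
    (hypothetical helper; multi-digit multipliers are not handled)."""
    carries, carry = 0, 0
    while multiplicand > 0:
        product = (multiplicand % 10) * digit + carry
        carry = product // 10
        if carry:
            carries += 1
        multiplicand //= 10
    return carries


def nomc2(multiplications):
    """NOMC2: total regroupings over (multiplicand, one-digit multiplier) pairs."""
    return sum(regroupings(m, d) for m, d in multiplications)


# Tallies read off the triangle example worked later in the paper.
print(oper2(["addition", "multiplication", "division"]))
print(quo([20]))                          # 180/9 = 20
print(nomc2([(2, 3), (20, 2), (40, 3)]))  # 3(2X), 2(20), 3(40)
# OPER2 ignores repetition of an operation; OPER3 does not.
print(oper2(["addition"] * 3), oper3(["addition"] * 3))
```

For the triangle example these print 10, 2, and 1, the coded values of OPER2, QUO, and NOMC2 in the example's coding table. OPER3 is left out of that check because the paper does not list how the individual operation occurrences were tallied.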

Four well-defined independent variables were selected from
Krushinski's study (1972). All linguistic variables are prefixed by the
letters "LG" and they are defined as follows:

6. LGWORD. One count was given for each word in the problem
statement. Numerals, e.g., 376.2, were given a count of one.
Written expressions, e.g., thirty-two, were given a count of 
two. 

7. LGSENT. One count was given for each sentence. 

8. LGWDQU. One count was given for each word in the question
sentence. 

9. LGPREP. One count was given for each preposition in the
problem statement.

Four new linguistic variables were defined for the present study as
follows:

10. LGMXST. One count was given for each word in the longest
sentence of the problem statement. 

11. LGNUQU . One count was given for each numeral in the question 
sentence. 

12. LGNMBR. One count was given for each numeral in the problem
statement.

13. LGREL. One count was given for each numerical relationship
stated in the problem.
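The word- and sentence-based counts (LGWORD, LGSENT, LGWDQU, LGPREP, and LGMXST) can be approximated with regular expressions. This is a minimal Python sketch under stated assumptions: the preposition list is a short hypothetical one, the question sentence is taken to be the last sentence, and sentence splitting ignores abbreviations. LGNUQU and LGNMBR are left out because the triangle example is coded LGNMBR = 2 although it contains no numerals written in digits, so a purely mechanical numeral count would not reproduce the coding; LGREL requires judging which relationships are stated.

```python
import re

# Hypothetical, deliberately short preposition list; a real coding would
# need a complete list agreed on by the coders.
PREPOSITIONS = {"about", "at", "by", "for", "from", "in", "into", "of", "on", "per", "to", "with"}


def words(text):
    """LGWORD tokens: a numeral such as 376.2 is one token, while a written
    number such as thirty-two splits at the hyphen into two tokens."""
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", text)


def sentences(text):
    """Split after '.', '?' or '!' followed by whitespace (abbreviations not handled)."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]


def linguistic_counts(text):
    sents = sentences(text)
    question = sents[-1]  # assumption: the question sentence is the last sentence
    return {
        "LGWORD": len(words(text)),
        "LGSENT": len(sents),
        "LGWDQU": len(words(question)),
        "LGPREP": sum(w.lower() in PREPOSITIONS for w in words(text)),
        "LGMXST": max(len(words(s)) for s in sents),
    }


problem = ("The second angle of a triangle is twice the first angle of the triangle. "
           "The third angle is three times the second angle. "
           "Find the angles of the triangle.")
print(linguistic_counts(problem))
```

For the triangle problem worked later in the paper, the sketch returns 29, 3, 6, 3, and 14, the coded values of variables 6 through 10.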

Nine new algebraic variables were defined for the present study
in terms of the equations expected to be used to solve each problem. Each
of these nine algebraic variables was prefixed by the letters "EQ". Three
different forms of each algebraic equation were used for determining the
count for the "EQ" variables. The three forms are defined as follows:

Form (A). Unsimplified Equation:

Example: 3/4(X + 2) + 6(2X) = 48

Form (B). Simplified Equation: The parentheses were removed.

Example: 3X/4 + 6/4 + 12X = 48

Form (C). Canonical Equation: The equation was written in
canonical form.

Example 1: 3X/4 + 6/4 + 12X - 48 = 0

Example 2: 0 = 48 - 3X/4 - 6/4 - 12X

The "EQ" algebraic variables were defined as follows: 

14. EQTRPZ. One count was given for each transposition required
to isolate the terms that contained variables from the
constant terms in the Form (B) equation. The count was taken
before any combination of like terms.

15. EQPARA. One count was given for each set of parentheses in
the Form (A) equation. Parenthetical expressions preceded by
+1 or -1 were not counted.

16. EQXTRM. One count was given for each term that contained a
variable in the Form (B) equation.

17. EQTOP. One count was given for each indicated arithmetic
operation in the Form (A) equation.

18. EQCHAR. One count was given for each alphanumeric character
in the Form (A) equation. One count was given for each
decimal point, sign, and left or right parenthesis.
19. EQSIGN. The minimum of the two counts defined as follows:

(a) One count was given for each positive term in the Form (C)
equation.

(b) One count was given for each negative term in the Form (C)
equation.

20. EQPTRM. One count was given for each term within parentheses
in the Form (A) equation. Parenthetical expressions preceded 
by a +1 or -1 were not counted. 

21. EQDEC. One count was given for each decimal or fraction in 
the Form (B) equation. 

22. EQANS. One count was given for each answer required in the
problem statement. 
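Most of the "EQ" counts can be computed mechanically from the three forms of an equation. The Python sketch below is one possible reading of the definitions, not the author's coding procedure. Its helper names are hypothetical, and it assumes equations are plain strings in the document's notation (X for the unknown, 2X for implied multiplication), that parentheses are not nested, that EQTRPZ uses whichever side assignment needs fewer moves, and that EQDEC can be approximated by counting Form (B) terms containing a decimal point or fraction bar. EQTOP and EQANS are left out: EQTOP depends on which implied multiplications are counted, and EQANS comes from the problem statement rather than the equation.

```python
import re

# Signed terms of a side with no parentheses: "3X/4+6/4+12X" -> ["3X/4", "+6/4", "+12X"].
TERM = re.compile(r"[+-]?[^+-]+")


def sides(equation):
    """Split an equation string into the signed terms of its left and right sides."""
    left, right = equation.replace(" ", "").split("=")
    return TERM.findall(left), TERM.findall(right)


def has_variable(term):
    return re.search(r"[A-Za-z]", term) is not None


def eqtrpz(form_b):
    """EQTRPZ: transpositions needed to separate variable terms from constants
    (assumption: the side assignment needing fewer moves is used)."""
    left, right = sides(form_b)
    var_l = sum(map(has_variable, left))
    var_r = sum(map(has_variable, right))
    const_l, const_r = len(left) - var_l, len(right) - var_r
    return min(var_r + const_l, var_l + const_r)


def counted_groups(form_a):
    """Contents of each parenthesis set whose coefficient is not +1 or -1
    (assumes the sets are not nested)."""
    s = form_a.replace(" ", "")
    groups = []
    for i, ch in enumerate(s):
        if ch != "(":
            continue
        coeff = re.search(r"[0-9A-Za-z./]*$", s[:i]).group()
        if coeff in ("", "1"):  # "(", "+(", "-(" or "1(" means a coefficient of +1 or -1
            continue
        groups.append(s[i + 1:s.index(")", i)])
    return groups


def eqpara(form_a):
    """EQPARA: parenthesis sets in Form (A), excluding coefficients of +1 or -1."""
    return len(counted_groups(form_a))


def eqptrm(form_a):
    """EQPTRM: terms inside the counted parenthesis sets of Form (A)."""
    return sum(len(TERM.findall(g)) for g in counted_groups(form_a))


def eqxtrm(form_b):
    """EQXTRM: terms containing a variable in Form (B)."""
    left, right = sides(form_b)
    return sum(map(has_variable, left + right))


def eqchar(form_a):
    """EQCHAR: alphanumerics, decimal points, signs, and parentheses in Form (A)."""
    return sum(ch.isalnum() or ch in ".+-()" for ch in form_a)


def eqsign(form_c):
    """EQSIGN: the smaller of the positive and negative term counts in Form (C)."""
    left, right = sides(form_c)
    terms = [t for t in left + right if t.lstrip("+-") != "0"]
    negative = sum(t.startswith("-") for t in terms)
    return min(negative, len(terms) - negative)


def eqdec(form_b):
    """EQDEC (approximation): Form (B) terms with a decimal point or fraction bar."""
    left, right = sides(form_b)
    return sum("." in t or "/" in t for t in left + right)


# Forms of the triangle example worked later in the paper.
form_a = "X + 2X + 3(2X) = 180"
form_b = "X + 2X + 6X = 180"
form_c = "X + 2X + 6X - 180 = 0"
print(eqtrpz(form_b), eqpara(form_a), eqxtrm(form_b), eqchar(form_a),
      eqsign(form_c), eqptrm(form_a), eqdec(form_b))
```

For the triangle example the printed values are 0, 1, 3, 13, 1, 1, and 0, matching the coded values of variables 14, 15, 16, 18, 19, 20, and 21. EQCHAR reproduces the coded 13 only if the equals sign is not counted, which is how the sketch reads the definition.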

Four additional algebraic variables were defined with respect to
the factors involved in writing the equation, often called the
translation aspect of verbal problem solving. The four variables
defined below were prefixed with the letters "TR".

23. TRTRAN. One count was given for each unknown that was used
in the definition of another unknown. 

24. TRTRMS. One count was given for each term of each unknown
defined.

25. TRCPMT. One count was given if an unknown was defined as the 
complement of another unknown. 

26. TRUKNS. One count was given for each unknown defined.
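The translation counts can be sketched by recording, for each unknown in the "Let X = ..." step, how many terms its definition has and which other unknowns it is built from. The `Unknown` record and `translation_counts` function are hypothetical, and TRCPMT is read here as one count per unknown defined as a complement, although the definition could also be read as a single 0/1 flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unknown:
    """One line of the translation step, e.g. 'Then 2X = the second angle'."""
    name: str
    terms: int = 1                                         # terms in the defining expression
    defined_from: List[str] = field(default_factory=list)  # unknowns used in that expression
    complement_of: Optional[str] = None                    # set when defined as, e.g., 90 - X


def translation_counts(unknowns):
    used = {src for u in unknowns for src in u.defined_from}
    return {
        "TRTRAN": len(used),  # different unknowns used in defining another unknown
        "TRTRMS": sum(u.terms for u in unknowns),
        "TRCPMT": sum(u.complement_of is not None for u in unknowns),
        "TRUKNS": len(unknowns),
    }


# Triangle example: X, then 2X, then 3(2X) built from the second angle.
chained = [
    Unknown("first"),
    Unknown("second", defined_from=["first"]),
    Unknown("third", defined_from=["second"]),
]
# Variant discussed later in the paper: both angles defined directly from the first.
direct = [
    Unknown("first"),
    Unknown("second", defined_from=["first"]),
    Unknown("third", defined_from=["first"]),
]
print(translation_counts(chained))
print(translation_counts(direct)["TRTRAN"])
```

For the triangle example this gives the coded values 2, 3, 0, and 3 for variables 23 through 26. Defining both angles directly from the first angle lowers TRTRAN from 2 to 1, which is the contrast drawn in the Discussion.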

An example of the coding of variables for word problems is given
below.

Example: The second angle of a triangle is twice the first angle
of the triangle. The third angle is three times the
second angle. Find the angles of the triangle.

A. Translations:

Let X = the first angle, 
Then 2X = the second angle, 
and 3(2X) = the third angle. 

B. Forms: 

X + 2X + 3(2X) = 180 Form (A) 
X + 2X + 6X = 180 Form (B) 

X + 2X + 6X - 180 = 0 Form (C) 
C. Solution:

X + 2X + 3(2X) = 180 
X + 2X + 6X = 180 
9X = 180 
X = 180/9 
Ans. X = 20 
Ans. 2X = 40 
Ans. 3(2X) = 3(40) = 120 

D. Variable Coding:

Variable   Value        Variable   Value
 Number                  Number
    1         2             14         0
    2        10             15         1
    3        16             16         3
    4         2             17         5
    5         1             18        13
    6        29             19         1
    7         3             20         1
    8         6             21         0
    9         3             22         3
   10        14             23         2
   11         0             24         3
   12         2             25         0
   13         2             26         3




Method 

Ninety-six students enrolled in the Introduction to Algebra Course 
(MATH 4) at The Pennsylvania State University participated in the present 
study. The course was organized to permit each student to progress at an
individual rate through each of 20 instructional units. The students took 
a computer-generated paper and pencil posttest at the end of each unit. 
The problems for each student's test were randomly selected from a file of 
problems prepared for each unit. 

Unit tests 9, 13, and 14 were selected for the purposes of this 
study. Units 9, 13, and 14 contained word problems whose solutions 
involved first-degree equations in one unknown. 




The 23 word problems selected for this study were those problems
that at least five students attempted. The problem set consisted of one
consecutive integer problem, two distance problems, three age problems,
four angles-of-a-triangle problems, four direct variation problems, and
seven miscellaneous problems.

Three direct variation problems had a percent correct of 100.
The fourth direct variation problem had a percent correct of 80.
Fourteen of the 25 structural variables had identical values for each of the
four direct variation problems. The direct variation problems required
formula-evaluation skills rather than the skills more common to the
other problems selected, e.g., formulation of equations and solution of
equations. Therefore, the four direct variation problems, the only four
problems selected from unit 14, were eliminated from the problem set
investigated in the present study.

A stepwise linear regression program, a modified version of
BMD02R (UCLA), was used to obtain regression coefficients and the
multiple correlations R and R². A clear explanation of the use of stepwise
linear regression can be found in Spurr and Bonini (1967). A detailed
explanation of the use of stepwise linear regression models with structural
variables in problem solving research can be found in Suppes, Jerman, and
Brian (1968) and Suppes, Loftus, and Jerman (1969).
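The stepwise procedure can be illustrated with a simplified forward-selection sketch in plain Python: at each step it enters the unused variable that produces the largest R² and records the increase in R², the quantities reported in Table 1. This is not BMD02R; full stepwise programs typically also apply F-to-enter and F-to-remove thresholds and may drop variables that have already entered, which this sketch omits. The function names and the synthetic data are hypothetical, and least squares is solved through the normal equations, which is adequate only for small, well-conditioned problems.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least-squares fit of y on the columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def forward_stepwise(predictors, y, steps):
    """At each step, enter the unused predictor that raises R^2 the most."""
    chosen, history, current = [], [], 0.0
    for _ in range(min(steps, len(predictors))):
        best_name, best_r2 = None, float("-inf")
        for name, col in predictors.items():
            if name in chosen:
                continue
            r2 = r_squared([predictors[c] for c in chosen] + [col], y)
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        chosen.append(best_name)
        history.append((best_name, best_r2, best_r2 - current))
        current = best_r2
    return history


# Small synthetic data set (not the study's data): y = 3*x1 + 2*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, 0, 1, 0, 1, 0, 1, 0]
x3 = [5, 3, 8, 1, 9, 2, 7, 4]
y = [3 * a + 2 * b for a, b in zip(x1, x2)]
for name, r2, gain in forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y, steps=2):
    print(f"{name}: R^2 = {r2:.4f}, increase = {gain:.4f}")
```

With these synthetic data x1 enters first and x2 second, and R² reaches 1 because y was constructed from exactly those two columns.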

Results 

The mean percent correct for the 24 word problems was 60.74 and
the standard deviation was 23.94.



The variables which entered in the first 12 steps of the stepwise
regression for the problem set are presented in Table 1. The multiple R,
R², and increase in R² are given for each step.



Insert Table 1 About Here 



An approximate indication of the goodness of fit of the regression
model was given by the multiple correlation coefficient, R, and by R², which
estimates the proportion of variance accounted for by the regression
model. In the present study about 80 percent of the variance was accounted
for by the model at step six, and about 94 percent at step 12.
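Because R is the square root of R² and each increase is the difference between successive R² values, the legible rows of Table 1 can be checked for internal consistency. A small Python sketch follows; the tolerance is an assumption chosen to allow for four-decimal rounding of each printed value, and the illegible rows for steps 1 and 2 are omitted.

```python
import math

# (step, variable, R, R^2, increase in R^2) as printed in Table 1, steps 3-12.
TABLE1 = [
    (3, "RECALL", 0.7788, 0.6066, 0.0819),
    (4, "QUO", 0.8187, 0.6703, 0.0637),
    (5, "EQTRPZ", 0.8764, 0.7682, 0.0978),
    (6, "OPER2", 0.8967, 0.8040, 0.0358),
    (7, "EQPARA", 0.9067, 0.8220, 0.0180),
    (8, "LGREL", 0.9235, 0.8529, 0.0309),
    (9, "TRTRMS", 0.9341, 0.8725, 0.0196),
    (10, "EQXTRM", 0.9471, 0.8970, 0.0246),
    (11, "EQTOP", 0.9660, 0.9331, 0.0361),
    (12, "LGNUQU", 0.9670, 0.9350, 0.0019),
]

# Each printed value is rounded to four decimals, so differences of about
# 0.0001 are expected even when the table is internally consistent.
TOL = 0.00015


def inconsistencies(rows, tol=TOL):
    """Steps where R differs from sqrt(R^2), or the increase from the change in R^2."""
    bad = []
    for step, _, r, r2, _ in rows:
        if abs(math.sqrt(r2) - r) > tol:
            bad.append((step, "R"))
    for prev, cur in zip(rows, rows[1:]):
        if abs((cur[3] - prev[3]) - cur[4]) > tol:
            bad.append((cur[0], "increase"))
    return bad


print(inconsistencies(TABLE1))
```

An empty list means every legible row agrees with itself to within rounding, including the R² values of .8040 at step six and .9350 at step 12 quoted above.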

Two translation variables, three linguistic variables, three
arithmetic variables, and four equation variables entered in the first 12
steps of the stepwise linear regression.

The regression coefficient, standard error, computed t-value, and 
partial correlation coefficient for each of the first 12 variables to enter 
the stepwise regression are presented in Table 2.



Insert Table 2 About Here 



The variable LGMXST, the number of words in the longest sentence,
had the highest correlation with the observed probability correct of the
word problem set. The variables EQPARA, TRTRAN, EQTOP, and LGREL were also
good predictors of the observed probability correct for the problem set.
The variable RECALL entered the stepwise linear regression on the third
step with a negative regression coefficient and a negative partial
correlation coefficient.

The QUO variable's computed t-value was significant at the .001
level, but the QUO variable also had the lowest partial correlation
coefficient. The variables TRTRAN, RECALL, EQTRPZ, and EQPARA were
significant at the .01 level. Significant at the .05 level were the
variables LGMXST, TRTRMS, EQXTRM, and EQTOP. The variables OPER2, LGREL,
and LGNUQU were not significant at the .05 level.
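Each computed t-value in Table 2 is the regression coefficient divided by its standard error, and the significance marks compare |t| with two-tailed critical values. The Python sketch below recomputes both from the printed coefficients and standard errors. The degrees of freedom are an assumption: 24 problems minus 12 predictors minus the intercept gives 11, and 2.201, 3.106, and 4.437 are the standard two-tailed t-table entries for 11 degrees of freedom at .05, .01, and .001.

```python
# (variable, coefficient, standard error, printed t, printed mark) from Table 2.
TABLE2 = [
    ("LGMXST", 0.08722, 0.02862, 3.047, "*"),
    ("TRTRAN", 2.30835, 0.56626, 4.076, "**"),
    ("RECALL", -0.25610, 0.07355, -3.482, "**"),
    ("QUO", 0.56031, 0.11833, 4.735, "***"),
    ("EQTRPZ", 0.99023, 0.27612, 3.586, "**"),
    ("OPER2", 0.10931, 0.07437, 1.470, ""),
    ("EQPARA", 1.50927, 0.34799, 4.337, "**"),
    ("LGREL", -0.35388, 0.21778, -1.625, ""),
    ("TRTRMS", -0.65485, 0.24555, -2.667, "*"),
    ("EQXTRM", 1.15027, 0.37336, 3.081, "*"),
    ("EQTOP", -0.32224, 0.12742, -2.529, "*"),
    ("LGNUQU", -0.05801, 0.10201, -0.569, ""),
]

# Two-tailed critical t values for 11 degrees of freedom (standard t tables).
# df = 11 is an assumption: 24 problems - 12 predictors - 1 intercept.
CRITICAL = [(4.437, "***"), (3.106, "**"), (2.201, "*")]


def stars(t):
    """Significance mark for a t statistic under the df = 11 assumption."""
    for cutoff, mark in CRITICAL:
        if abs(t) >= cutoff:
            return mark
    return ""


for name, coef, se, t_printed, mark in TABLE2:
    t = coef / se
    print(f"{name:<7} t = {t:7.3f} (printed {t_printed:7.3f})  "
          f"{stars(t) or 'n.s.'} (printed {mark or 'n.s.'})")
```

Under that assumption the recomputed t-values agree with the printed ones to within rounding, and every significance mark in Table 2 is reproduced, including the borderline EQXTRM (3.081, just below 3.106) and EQPARA (4.337, just below 4.437).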

Discussion

As the introduction indicated, the underlying purpose of the
structural variable research has been to identify about six independent,
well-defined structural variables which permit a reasonably accurate
prediction of the observed probability correct of word problems.
Therefore, the present study was primarily interested in those structural
variables which entered in the first six steps of the stepwise linear
regression. The data for the six variables which entered in steps 7-12
of the stepwise regression were included for completeness. The
discussion is restricted to the variables which entered in the first six
steps of the regression. One linguistic variable (LGMXST), one
translation variable (TRTRAN), one equation variable (EQTRPZ), and three
arithmetic variables (RECALL, QUO, OPER2) comprised the first six entries.

The t-values of LGMXST, TRTRAN, RECALL, QUO, and EQTRPZ were
significant at the .05 level or lower. The variable OPER2 was not
significant at the .05 level.



The multiple R, R² values, and the significance of the computed
t-values were quite encouraging. Other encouraging results were the entry
of RECALL and QUO in the first six steps of the regression and the
significance of their computed t-values. RECALL and QUO were robust variables
in the research done using arithmetic word problems (Jerman, 1971; Jerman
and Rees, 1972; Jerman and Mirman, 1972; Jerman, 1972).

The sign of the partial correlation coefficient of the RECALL
variable was negative. This negative correlation might imply that
algebraic word problems involving the recall and use of a formula are
easier. The recall of a formula might aid in the recognition of the
problem type and the steps that should be taken to solve the algebra
problem.

Jerman and Mirman (1972) and Jerman (1972) found that the OPER3
variable, a weighted count of the number and type of operations necessary to
solve a problem, entered in the first six steps of the stepwise linear
regression; the OPER2 variable, a weighted count of the type of operations
necessary to solve a problem, did not enter in one of the first six steps.
In the present study the opposite situation occurred: the OPER2 variable
entered in the first six steps but the OPER3 variable did not. Even when
the OPER2 and OPER3 variables were analyzed independently, the result was
unchanged. The relative contribution of OPER2 versus OPER3 in the
prediction of the observed probability correct in word problems is still unclear.
Structural variables defined in terms of the different types of operations
and the number of operations seem to have a definite robustness in the
studies using arithmetic and algebra word problems. The effect of weighting
the types of operations versus weighting the type and number of operations
should be investigated in future research.

The linguistic variable labelled LGWORD in the present study, the
number of words in the problem statement, entered in one of the first six
steps of the stepwise linear regression in several previous studies (Loftus,
1970; Jerman, 1971; Jerman and Mirman, 1972; Jerman and Rees, 1972; Jerman,
1972). The variable LGWORD did not enter in any of the first twelve steps
of the regression in the present study. The linguistic variable LGMXST,
the number of words in the longest sentence of the problem statement,
entered in the first step of the stepwise regression with a significant
t-value in the present study. The results of the present study, using
LGMXST, and the previous studies cited, using LGWORD, indicate that
algebraic and arithmetic word problems will be difficult to solve if they
involve either lengthy sentences or lengthy problem statements,
respectively. The relative contribution of lengthy sentences and problem
statements to the variance of the observed probability correct of word problems
should be systematically investigated in future studies.

The translation variable TRTRAN, the number of different unknowns
that were used in the definition of other unknowns, entered in the second
step and made a significant contribution to the regression equation. This
seems to imply that algebra word problems which involve a logical
transitivity among the unknowns will be relatively more difficult to solve.
On the basis of the present study, a problem in which the second and third
angles of a triangle are both defined directly in terms of the first angle will
be easier to solve than a problem in which the third angle is defined in
terms of the second angle, which is in turn defined in terms of the first angle.
The equation variable EQTRPZ, the number of transpositions
required, entered in the fifth step of the regression and was significant.
It appears that the number of transpositions is an important factor that
varies directly with the difficulty of the algebraic word problems.

In terms of the present study, the structural variables LGMXST,
TRTRAN, RECALL, QUO, EQTRPZ, and OPER2 were the most important in accounting
for a maximum amount of the variance of the observed probability correct
of algebraic word problems attempted by college students in an
introductory algebra course.

The limitations of the study must be recognized. The subjects
were college students, and some word problems were attempted by as few
as five students. This study was an initial investigation to determine
some structural variables that show promise for further refinement and
research, and as such, the study fulfilled its role. A main objective
for future studies is to increase considerably the number of students
who attempt the problems. The writer would like to see some future
investigations in secondary school algebra classes. Also, the variables
should be tested using an entirely different set of algebra word problems.
Research should be conducted to analyze the effect of systematically
varying the definitions of some structural variables. If the structural
variables prove to be robust, a set of problems containing several predicted
levels of difficulty should be generated and tested with students.
TABLE 1

Order of Entry, R, R², and Increase in R²
for the 24 Problems

Step   Variable   Variable   R             R²            Increase
       Name       Number                                 in R²
  1    LGMXST     10         (illegible)   (illegible)   (illegible)
  2    TRTRAN     23         (illegible)   (illegible)   (illegible)
  3    RECALL      1         0.7788        0.6066        0.0819
  4    QUO         4         0.8187        0.6703        0.0637
  5    EQTRPZ     14         0.8764        0.7682        0.0978
  6    OPER2       2         0.8967        0.8040        0.0358
  7    EQPARA     15         0.9067        0.8220        0.0180
  8    LGREL      13         0.9235        0.8529        0.0309
  9    TRTRMS     24         0.9341        0.8725        0.0196
 10    EQXTRM     16         0.9471        0.8970        0.0246
 11    EQTOP      17         0.9660        0.9331        0.0361
 12    LGNUQU     11         0.9670        0.9350        0.0019
Table 2

Regression Coefficients, Standard Errors of Regression Coefficients,
Computed t-Values, and Partial Correlation Coefficients
for the Problem Set

                 Regression    Standard    Computed    Partial
Step  Variable   Coefficient   Error       t-Value     Correlation
                                                       Coefficient
  1   LGMXST      0.08722      0.02862      3.047*     (illegible)
  2   TRTRAN      2.30835      0.56626      4.076**     0.468
  3   RECALL     -0.25610      0.07355     -3.482**    -0.290
  4   QUO         0.56031      0.11833      4.735***    0.027
  5   EQTRPZ      0.99023      0.27612      3.586**     0.293
  6   OPER2       0.10931      0.07437      1.470       0.184
  7   EQPARA      1.50927      0.34799      4.337**     0.487
  8   LGREL      -0.35388      0.21778     -1.625       0.416
  9   TRTRMS     -0.65485      0.24555     -2.667*      0.328
 10   EQXTRM      1.15027      0.37336      3.081*      0.311
 11   EQTOP      -0.32224      0.12742     -2.529*      0.436
 12   LGNUQU     -0.05801      0.10201     -0.569       0.177

C = -6.09106

*p < .05
**p < .01
***p < .001



